Reinforcement Learning for MDPs with Constraints

Author

  • Peter Geibel
Abstract

In this article, I consider Markov Decision Processes with two criteria, each defined as the expected value of an infinite-horizon cumulative return. The second criterion is either itself subject to an inequality constraint, or there is a maximum allowable probability that the individual returns violate the constraint. I describe and discuss three new reinforcement learning approaches for solving such control problems.
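
As a rough illustration of the two problem variants described above, the setting can be sketched as follows. The discount factor \gamma, the per-step reward r_t and cost c_t, the thresholds \alpha and \beta, and the direction of the inequalities are notational assumptions made for this sketch; they are not taken from the article.

```latex
% Illustrative notation only: \gamma, r_t, c_t, \alpha, \beta are assumed symbols.
\begin{align*}
\text{First criterion (to be optimized):}\quad
  & \max_{\pi}\; \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right] \\
\text{Variant 1 (constraint on the expected second return):}\quad
  & \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} c_t\right] \le \alpha \\
\text{Variant 2 (bounded probability of violation):}\quad
  & \Pr\nolimits^{\pi}\!\left(\sum_{t=0}^{\infty} \gamma^{t} c_t > \alpha\right) \le \beta
\end{align*}
```

In the first variant only the average of the second return is constrained, whereas in the second variant each individual return may exceed the threshold only with probability at most \beta.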


Similar resources

Convergent Reinforcement Learning for Hierarchical Reactive Plans

Hierarchical reinforcement learning techniques operate on structured plans. Although structured representations add expressive power to Markov Decision Processes (MDPs), current approaches impose constraints that force the associated convergence proofs to depend upon a subroutine-style execution model that restricts adaptive response. We develop an alternate approach to convergent learning that ...

Uncertain Reward-Transition MDPs for Negotiable Reinforcement Learning

Multiple-Goal Reinforcement Learning with Modular Sarsa(0)

We present a new algorithm, GM-Sarsa(0), for finding approximate solutions to multiple-goal reinforcement learning problems that are modeled as composite Markov decision processes. According to our formulation, different sub-goals are modeled as MDPs that are coupled by the requirement that they share actions. Existing reinforcement learning algorithms address similar problem formulations by fir...

Reinforcement Learning in Large or Unknown MDPs

A Generalized Reinforcement-Learning Model: Convergence and Applications

Reinforcement learning is the process by which an autonomous agent uses its experience interacting with an environment to improve its behavior. The Markov decision process (MDP) model is a popular way of formalizing the reinforcement-learning problem, but it is by no means the only way. In this paper, we show how many of the important theoretical results concerning reinforcement learning in MDP...


Publication date: 2006